FOREWORD


The  digital  signal  processor  design  discussed  in  this  document  was  developed 
at  Honeywell’s  Applied  Research  Department  of  the  Systems  and  Research 
Division  by  Mr.  Robert  Berg  and  Dr.  Larry  Kinney  of  the  Computer  Techniques 
section.  Mr.  Ferdinand  Ohnsorg  developed  the  Fast  Walsh  Transform,  and 
Dr.  M.  Geokezas  did  the  accuracy  analysis  and  application  studies.  Both  men 
are  in  the  Information  Processing  section. 

This development effort has been sponsored by the Honeywell Research
Department and by the Honeywell Ordnance Division, whose support and
encouragement are gratefully acknowledged.


George  Swanlund 
Principal  Staff  Scientist 
INTRODUCTION


Digital signal processing is being used to perform an ever-increasing share
of signal processing and spectral analysis tasks in both scientific and
operational disciplines. This is because digital processing has advantages
not available in conventional analog techniques:

• Output  is  compatible  with  digital  equipment  used  in  subsequent 
computations 

• Very-low-frequency  signals  are  processed  efficiently  with  much 
smaller  equipment 

• Insensitivity  to  environmental  conditions  or  changes 

• Operational  stability 

• A complete  set  of  operating  modes  is  available  in  one  unit 

• Smaller,  lighter,  less  expensive,  more  reliable,  more 
maintainable 

• Can time-share devices to service a number of inputs

• Accurate 

The  implementation  of  digital  processors  has  advanced  quite  rapidly  in  recent 
years  because  of  two  key  developments: 

• Fast  Fourier  and  Fast  Walsh  algorithms  can  now  be  used  for 
frequency  transforms 

• Integrated  circuitry  now  permits  low-cost,  special-purpose 
computers 
Computational speed has increased dramatically because of these fast
transforms. Directly implementing a Fourier transform requires N^2
computations, where N is the number of discrete samples. The fast transform
reduces the number of computations required to N log2 N. As an example, when
N = 512, the computations are reduced from 262,144 to 4,608, or by a factor
of 57.
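
The arithmetic behind this example can be checked with a few lines of Python.
This is only an illustrative sketch; the function names are hypothetical and
are not part of DISP.

```python
def direct_transform_ops(n):
    """Computation count for a directly implemented transform: N^2."""
    return n * n


def fast_transform_ops(n):
    """Computation count for the fast transform: N log2 N (N a power of two)."""
    stages = n.bit_length() - 1
    if n < 2 or (1 << stages) != n:
        raise ValueError("N must be a power of two")
    return n * stages


n = 512
direct = direct_transform_ops(n)   # 262144
fast = fast_transform_ops(n)       # 4608
ratio = direct / fast              # about 56.9, quoted as 57 in the text
print(direct, fast, round(ratio))
```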

The advent of large-scale integrated circuits (LSICs) permits the economic
realization of parallel arrays of processing modules. An array of 512
arithmetic units can further reduce processing time by a factor of 512. Each
arithmetic unit then performs only nine computations, and the transform can
be performed in real time. Furthermore, incorporating microprogramming into
each module allows a variety of processing modes, combining flexibility with
the speed of special-purpose computation.

Honeywell's  Digital  Signal  Processor  (DISP)  incorporates  these  new  algorithms 
into  parallel  arrays  of  identical  processing  modules.  Each  module  consists  of 
two  arithmetic  units  fabricated  on  one  LSIC  chip,  resulting  in  a processor 
whose  size,  weight,  cost,  power  dissipation,  and  reliability  are  particularly 
appropriate  for: 

• General  laboratory  computations 

• Real-time  simulations 

• Operational  hardware 

• Portable  test  equipment 


(either stand-alone or tied in to a computer facility)
SECTION I
SUMMARY 


The Digital Signal Processor (DISP) is composed of five main units
(Figure 1-1):

1.  An  expandable  array  of  N identical  processing  modules,  where 
each  module  performs  identical  computations  simultaneously 

2.  2N  shift  registers 

3.  Control  unit 

4.  Input  buffer 

5.  Output  buffer 

Each processing module is, in effect, a small microprogrammed computer
with its own input and output registers, memory, arithmetic section and
instruction repertoire. All input and output data are represented in 12-bit*
fractional 2s complement format. The arithmetic portion of the processing
module can:

1.  Add 

2.  Subtract 

3.  Multiply  (simple  and  complex) 

The  instruction  repertoire  permits  selecting  a complete  set  of  signal  processing 
modes.  (These  modes  are  discussed  in  detail  in  Section  II.)  DISP  is  switched 
to  a new  mode  simply  by  a control  command. 


* 12  bits  is  a nominal  value.  The  number  of  bits  is  optional. 
[Block diagram showing the shift registers and the processing modules.
FFT-FWT sample size = 2N; digital filters = N.]

Figure 1-1. Block Diagram of a Digital Signal Processor
The unit scales automatically to maintain full dynamic range. Any arithmetic
overflow is detected by the module in which it occurs. The module notifies the
control section that overflow has occurred, and the control unit issues a
command, correcting the overflow condition and properly scaling the data
in all modules.


Each processing module is fabricated on one identical LSIC using
bipolar-compatible Metal Oxide Semiconductor (MOS) technology. Each module
performs serial arithmetic at a bit cycle of 1 µsec or less. Since the data
word used in DISP is 12 bits long and requires overflow detection and
correction, a word time consists of 13 bit times (13 µsec).

Shift registers are required to store the weighting factors (W_i) of the Fast
Fourier Transform (FFT), and to store the constants of the digital filter. The
length of the shift registers increases as log2 N to accommodate the FFT
mode (every time N is doubled, another stage is added in the FFT algorithm).

To  illustrate  the  physical  characteristics  of  a DISP,  the  following  estimates 
are  made  for  a DISP  containing  256  processing  modules: 


Description               Size       Weight     Power
                          (cu ft)    (lbs)      (watts)

Standard packaging        1.0        40         100
Miniaturized packaging    0.1        10         80

DISP can be tied into a GP computer (Figure 1-2) or it can stand alone in
real-time simulations or on-line in an operational system. Each processing
module has buffer registers internal to the module. Various groupings of
these registers allow output data to be transmitted at a rate compatible with
a wide range of I/O devices.
Figure 1-2. DISP-GP Computer Tie-In


DISP VERSUS OTHER DIGITAL PROCESSORS

Appendix A consists of a chart from the IEEE Transactions on Audio and
Electroacoustics, Vol. AU-17, No. 2, June 1969, entitled "FFT Hardware
Implementations - A Survey", by Glen D. Bergland. The Honeywell DISP
capabilities have been added to this chart. DISP matches or exceeds the
capabilities of all other units. In addition, the size, weight and power of
DISP are smaller than for any of the other equipment described.


SUMMARY  OF  REST  OF  DOCUMENT 

Section II discusses the set of operating modes which are available. Some of
the complex modes and systems applications require a tie-in to a GP computer.



Section III presents the accuracy results of DISP operating as an FFT and a
bank of filters. This analysis establishes that a 12-bit unit will be adequate
for the majority of applications.

Section  IV  presents: 

1.  A description  of  the  DISP  system  organization 

2.  A detailed  description  of  the  design  of  the  processing  module 
and  how  it  operates  in  the  system 

3.  The  functions  of  the  DISP  control  unit 
SECTION II
OPERATING  MODES 


The  DISP  has  three  levels  of  operating  modes: 

1.  The  basic  modes  consist  of  single  operations  such  as  a Fast 
Fourier  Transform  (FFT)  or  a multiplication  of  two  functions. 

2.  The complex modes consist of two or more basic modes, e.g., the
power spectrum mode includes a Fast Fourier Transform and
a subsequent squaring of the frequency coefficients.

3.  The  functional  modes  consist  of  some  specific  signal  processing 
application.  Some  of  the  application  modes  can  be  performed 
entirely  within  DISP,  while  others  assume  additional  external 
processing.  The  functional  modes  shown  are  not  exhaustive  and 
serve  mainly  to  illustrate  typical  applications. 


BASIC  MODES 

The basic modes and their execution times are listed in Table 2-1.
A brief discussion of each mode is given below.


Fast Fourier Transform (FFT)


The Fourier Transform is based on sine and cosine functions and is used
effectively for spectral analysis of real or complex inputs. The Walsh
transform of real inputs is based on rectangular functions analogous to
hard-clipped sine and cosine functions. The Walsh transform of complex
inputs is based on rectangular functions analogous to the hard-clipped
exponential representation of the sinusoids.


Table 2-1. Basic Modes

                                                      PROCESSING TIMES, msec
MODE                                      SYMBOL      256 POINT    512 POINT

1.  FAST FOURIER TRANSFORM                FFT         1.118        1.274
    a)  INVERSE FFT                       IFFT        1.118        1.274
2.  FAST WALSH TRANSFORM                  FWT         0.104        0.117
    a)  INVERSE FWT                       IFWT        0.104        0.117
    b)  FWT - COMPLEX INPUTS              FWT (C)     0.117        0.130
    c)  IFWT - COMPLEX INPUTS             IFWT (C)    0.117        0.130
3.  TIME WINDOW WEIGHTING (COMPLEX)       W           0.299        0.299
4.  SQUARE COMPLEX FUNCTION               SQ          0.364        0.364
5.  MULTIPLY TWO FUNCTIONS (COMPLEX)      MPLY        0.400        0.400
6.  DIGITAL FILTER BANK - 2nd ORDER       DFB         0.468        0.468
    a)  DFB 4th ORDER                     DFB (4)     0.962        0.962
    b)  DFB 6th ORDER                     DFB (6)     1.443        1.443
    c)  DFB 1st ORDER (LOW PASS FILTER)   LPF         0.351        0.351

The  DISP  easily  computes  these  three  transforms  because  all  use  the  same 
computational  flow  algorithm,  although  requiring  different  weighting 
coefficients.  (This  algorithm  is  shown  in  Figure  2-1  for  a complex  FFT  of 
eight  input  samples.) 

The unique feature of the algorithm (Figure 2-1) is that each of the k columns
(N = 2^k) requires identical computations, and combines the same samples to
derive a new sample.

A solid line to a node represents addition, a dashed line subtraction, and W_i
a complex multiplication. The W_i's represent complex weighting factors
because the Fourier transform has a sinusoidal basis function:

\[ W_i = \cos\frac{2\pi i}{N} - j \sin\frac{2\pi i}{N} \]

The  operations  performed  in  a single  module  are  shown  in  Figure  2-2.  Each 
module  is  time  shared  over  all  k columns  or  stages. 


Note  that  the  algorithm  does  not  produce  the  Fourier  coefficients  in  their 
natural  order.  The  output  order  can  be  found  by  first  numbering  the  outputs 
in  natural  order  using  binary  numbers,  then  reversing  the  order  of  the  digits 
of  the  binary  numbers  and  interpreting  the  resulting  number  as  the  number  of 
the  Fourier  coefficient. 
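
The bit-reversed output ordering can be illustrated with a conventional
radix-2 transform. The sketch below is a textbook decimation-in-frequency FFT,
not a transcription of the flow graph in Figure 2-1; all function names are
hypothetical. It leaves its results in bit-reversed order, restores natural
order by the digit-reversal rule described above, and compares the result
with a directly evaluated transform.

```python
import cmath


def bit_reverse(index, bits):
    """Reverse the lowest `bits` binary digits of index."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (index & 1)
        index >>= 1
    return out


def fft_bit_reversed(samples):
    """Radix-2 FFT whose results are left in bit-reversed order."""
    x = [complex(v) for v in samples]
    n = len(x)
    stages = n.bit_length() - 1
    if n < 2 or (1 << stages) != n:
        raise ValueError("sample count must be a power of two")
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for i in range(span):
                # Weighting factor cos(2*pi*i/L) - j sin(2*pi*i/L), L = 2*span
                w = cmath.exp(-2j * cmath.pi * i / (2 * span))
                a, b = x[start + i], x[start + i + span]
                x[start + i] = a + b
                x[start + i + span] = (a - b) * w
        span //= 2
    return x


def natural_order(raw):
    """Coefficient j is found at the position whose binary digits are reversed."""
    bits = len(raw).bit_length() - 1
    return [raw[bit_reverse(j, bits)] for j in range(len(raw))]


def direct_dft(samples):
    """Directly evaluated transform (N^2 computations), used as a reference."""
    n = len(samples)
    return [sum(samples[m] * cmath.exp(-2j * cmath.pi * j * m / n)
                for m in range(n)) for j in range(n)]


samples = [1, 2, 3, 4, 0, -1, -2, -3]
raw = fft_bit_reversed(samples)
ordered = natural_order(raw)
reference = direct_dft(samples)
```

For eight samples, position 1 (binary 001) of the raw output holds
coefficient 4 (binary 100).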


Figure 2-1. Algorithm for the Fast Fourier Transform of Eight Input Samples
(input samples at left, stages across, output at right)
Figure 2-2. Module Operation: FFT Mode


Fast Walsh Transform (FWT)


Since the Walsh transform of real samples is based on rectangular functions,
the only weighting coefficients are plus and minus one. These coefficients
are processed by addition and subtraction. The combinational algorithm of
the FWT is identical to Figure 2-1, if all of the W_i terms are removed.

Since  FWT  computations  require  no  multiplications,  they  are  performed 
much  faster  than  FFT  with  the  same  number  of  discrete  data  samples. 
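
A minimal sketch of a fast Walsh transform built only from additions and
subtractions is given below. It uses the common in-place butterfly arrangement
and produces coefficients in Hadamard (natural) order; it is an illustration
under that assumption, not the DISP microprogram, and the function name is
hypothetical.

```python
def fast_walsh_transform(samples):
    """Fast Walsh transform using only additions and subtractions."""
    x = list(samples)
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("sample count must be a power of two")
    half = 1
    while half < n:                      # one pass per stage, log2 N stages
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                a, b = x[i], x[i + half]
                x[i], x[i + half] = a + b, a - b
        half *= 2
    return x


data = [3, 1, 4, 1, 5, 9, 2, 6]
coefficients = fast_walsh_transform(data)
# Applying the transform twice returns N times the original samples.
round_trip = [v // len(data) for v in fast_walsh_transform(coefficients)]
```

The FWT entries in Table 2-1 appear consistent with one 13 µsec word time per
stage (8 x 13 = 104 µsec for 256 points, 9 x 13 = 117 µsec for 512 points),
although the text does not state this breakdown.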


The complex Walsh transform algorithm requires multiplying certain data
values by the value -j (j = sqrt(-1)). This is done internally by
complementing the real portion of the data and interchanging the real and
imaginary parts. The algorithm for the complex FWT is the same as that in
Figure 2-1 if all values of W_(N/4) are replaced by -j, and all other W_i's
removed.
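
The multiplication by -j needs no multiplier, since
(a + jb)(-j) = b - ja. A short sketch, with a hypothetical function name:

```python
def times_minus_j(real, imag):
    """Multiply (real + j*imag) by -j: complement the real part, then
    interchange the real and imaginary parts."""
    complemented_real = -real
    return imag, complemented_real


pairs = [(3, 4), (-2, 5), (0, -7), (6, 0)]
results = [times_minus_j(a, b) for a, b in pairs]
```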

For all FWT algorithms, outputs are ordered differently than shown in
Figure 2-1. The FWT output order can be found by numbering the outputs in
binary, reversing the digits of the binary numbers, and interpreting the
resulting digits as the Gray code for the number of the FWT coefficient. For
N = 8, the output order starting at the top of Figure 2-1 is h0, h7, h3, h4,
h1, h6, h2 and h5.
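
The ordering rule can be applied mechanically. The following sketch (function
names are hypothetical) numbers the outputs in binary, reverses the digits,
and decodes the result as a Gray code.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` binary digits of index."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (index & 1)
        index >>= 1
    return out


def gray_to_binary(gray):
    """Decode a Gray-coded integer to its ordinary binary value."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def fwt_output_order(bits):
    """Coefficient number found at each output position, top to bottom."""
    return [gray_to_binary(bit_reverse(p, bits)) for p in range(1 << bits)]


order = fwt_output_order(3)
print(order)  # [0, 7, 3, 4, 1, 6, 2, 5]
```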


Time  Window  Weighting,  W 

In some cases, it is desired to shape the time representation of the data to
achieve a more desirable frequency function. In the case of coherent
detection, multiplication by a reference function is desired. In both of these
cases, either real or complex functions are involved for both the input and
the time window weighting function.
The time window weighting is accomplished by storing the weighting factors
in the module shift registers. The resulting weighted data are retained in
the module for subsequent processing.


Square  One  Function,  SQ 


The  squaring  operation  is  similar  to  the  time  window  weighting  except  that 
the  multiplier  and  multiplicand  are  the  same  and  are  already  in  the  module. 
Squaring  is  typically  an  intermediate  operation. 


Multiply  Two  Functions,  MPLY 

Again the process is similar to the time window weighting except both
functions are in the module. MPLY is also usually an intermediate operation.


Digital  Filter  Bank,  DFB 

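Each module in a DISP is capable of performing second-order digital
filtering of the form

\[
\begin{bmatrix} X_1(n) \\ X_2(n) \end{bmatrix}
=
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} X_1(n-1) \\ X_2(n-1) \end{bmatrix}
+
\begin{bmatrix} 1 \\ 0 \end{bmatrix} u(n)
\]

\[ Z_1(n) = b_0 X_1(n) \]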
This is a recursive filter. The state X(n) depends only on the first
previous state X(n-1) and the current input u(n). The output Z1(n) is
real and is a function of only X1(n). The module operation in the filter
mode is shown in Figure 2-3. The bandwidth and Q of each filter are
determined by the values of the coefficients.

The output Z1(n) can also be stored in the module. Enough storage space
within the module is left to store the states of two other second-order
filters. Thus, the module can perform the calculations required of three
second-order filters in cascade, thereby simulating a sixth-order digital
filter.
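
The recursion and the three-section cascade can be sketched as follows. The
coefficient values are hypothetical, chosen to give a damped resonator whose
impulse response is known in closed form; they are not DISP design values, and
floating-point arithmetic stands in for the 12-bit fixed-point format.

```python
import math


def make_section(a11, a12, a21, a22, b0):
    """Second-order section: X(n) = A X(n-1) + [1, 0] u(n), Z1(n) = b0 X1(n)."""
    state = [0.0, 0.0]

    def step(u):
        x1 = a11 * state[0] + a12 * state[1] + u
        x2 = a21 * state[0] + a22 * state[1]
        state[0], state[1] = x1, x2
        return b0 * x1

    return step


def make_cascade(sections):
    """Feed the output of each section into the next one."""
    def step(u):
        for section in sections:
            u = section(u)
        return u

    return step


# Hypothetical coefficients: a rotation by theta scaled by r.
r, theta, b0 = 0.9, math.pi / 8, 0.5
coeffs = (r * math.cos(theta), -r * math.sin(theta),
          r * math.sin(theta), r * math.cos(theta), b0)

resonator = make_section(*coeffs)
impulse_response = [resonator(1.0 if n == 0 else 0.0) for n in range(16)]

sixth_order = make_cascade([make_section(*coeffs) for _ in range(3)])
sixth_order_response = [sixth_order(1.0 if n == 0 else 0.0) for n in range(16)]
```

With these coefficients the impulse response of one section is
b0 r^n cos(n theta), so r sets the bandwidth and theta the center frequency.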


COMPLEX  MODES 

The complex modes consist of two or more basic modes. They are listed
in Table 2-II. The processing times shown are for a 512-point transform.
Since these modes generally require some interaction with a general
purpose digital computer, the operation times are given for two different
data transfer rates. These rates correspond to two current 16-bit
mini-computers, namely 0.286 x 10^6 s/sec and 1.43 x 10^6 s/sec;
1 sample = 12 bits.


Power  Spectrum,  PDF 

To compute the power spectrum, the outputs from the FFT are squared.
Since the module output is a complex number, the multiplication is
complex. The output is both stored and conjugated (Y*). The product
YY* is a real, positive number. Also, the power coefficients for positive
frequencies (0, N/2-1) are the same as for negative frequencies (N/2, N-1).
Thus, only the positive frequencies need to be read out.
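
A short sketch of the power spectrum computation follows. It assumes a real
input sequence, for which coefficient j mirrors coefficient N - j, and it uses
a directly evaluated transform in place of the DISP FFT; the function names
are hypothetical.

```python
import cmath


def dft(samples):
    """Directly evaluated transform, standing in for the DISP FFT."""
    n = len(samples)
    return [sum(samples[m] * cmath.exp(-2j * cmath.pi * j * m / n)
                for m in range(n)) for j in range(n)]


def power_spectrum(samples):
    """Multiply each coefficient Y by its conjugate Y*; the product is real."""
    return [(y * y.conjugate()).real for y in dft(samples)]


samples = [0.5, 1.0, -0.25, 0.75, 0.0, -1.0, 0.25, -0.5]
power = power_spectrum(samples)
half = power[: len(power) // 2]   # only these need to be read out
```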
Table 2-II. Complex Modes (512 Points)

                                            PROCESSING TIME IN MILLISECONDS
                                            FOR THE GIVEN TRANSFER RATES

                                            4.992 x 10^6     20 x 10^6
MODE                            SYMBOL      BITS/SEC         BITS/SEC

1.  POWER SPECTRUM              PDF         1.738            1.738
2.  CROSS POWER SPECTRUM        XPDF        4.992            2.948
3.  AUTO CORRELATION            R11         3.718            3.718
4.  CROSS CORRELATION           R12         4.222            4.222
5.  CONVOLUTION                 H12         4.222            4.222
6.  DOUBLE LENGTH FFT           FFT(2)      18.304           4.576
7.  QUADRUPLE LENGTH FFT        FFT(4)      36.608           9.152
8.  ENERGY-TIME-FREQUENCY       ETF         0.936            0.936
    (2nd ORDER)
9.  FREQUENCY TRANSLATION       FT          1.738            1.738
Cross-Power Spectrum, XPDF

The operations are the same as for the power spectrum except that two
transforms are required. The first transform outputs are stored in the module
while performing the second transform. Also, the power coefficients are now
complex. However, only the positive frequency terms need be read out since
the negative frequency terms are complex conjugates.
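
The conjugate relation between positive and negative frequency terms can be
checked with a small sketch. It assumes two real input sequences and uses a
directly evaluated transform in place of the DISP FFT; the function names are
hypothetical.

```python
import cmath


def dft(samples):
    """Directly evaluated transform, standing in for the DISP FFT."""
    n = len(samples)
    return [sum(samples[m] * cmath.exp(-2j * cmath.pi * j * m / n)
                for m in range(n)) for j in range(n)]


def cross_power_spectrum(x1, x2):
    """Multiply the first transform by the conjugate of the second."""
    y1, y2 = dft(x1), dft(x2)
    return [a * b.conjugate() for a, b in zip(y1, y2)]


x1 = [1.0, 0.5, -0.5, 2.0, 0.0, -1.0, 0.25, 0.75]
x2 = [0.0, 1.0, 1.5, -0.5, -2.0, 0.5, 1.0, -1.0]
cross = cross_power_spectrum(x1, x2)
```

As a side observation, two 512-point FFTs plus one complex multiplication
from Table 2-1 (2 x 1.274 + 0.400 msec) total 2.948 msec, which matches the
XPDF entry at the higher transfer rate in Table 2-II. This decomposition is
inferred from the tables and is not stated in the text.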


Correlation  and  Convolution  Modes 

Correlation  and  convolution  are  performed  via  the  Fast  Fourier  Transform. 
Both  operations  require  a segment  of  N/2  zeros  adjoining  a data  segment  of 
N/2  values.  Thus,  the  data  sample  is  only  N/2  rather  than  N. 

For correlating two functions X1(k), X2(k), the procedure is

1.  Adjoin N/2 zeros to X1(k), X2(k) as

        X(k) = X(k)     0 <= k < N/2
        X(k) = 0        N/2 <= k < N

2.  Compute FFT of X1(k), X2(k) to give Y1(j), Y2(j)

3.  Take complex conjugate of Y2(j), or Y2(j)*

4.  Multiply Z(j) = Y1(j) · Y2(j)*

5.  Compute FFT^-1 of Z(j) to obtain R12(k)

The output R12(k) represents the correlation over the interval (-N/2, N/2),
i.e.,

\[ R_{12}(L) = \frac{1}{N} \sum_{k=0}^{N-1} X_2(k)\, X_1(k+L),
   \qquad L = -\frac{N}{2}, \ldots, \frac{N}{2} - 1 \]

For auto correlation, Y1(j) and its complex conjugate Y1(j)* are multiplied,
Z(j) = Y1(j) · Y1(j)*, and transformed to obtain R11(L).

For convolving two functions X1(k), X2(k), the procedure is similar.

1.  Repeat steps 1 and 2

2.  Multiply Z(j) = Y1(j) · Y2(j)

3.  Compute FFT^-1 of Z(j) to obtain V(k). The output V(k)
represents the convolution over the interval -N/2, ..., N/2, i.e.,

\[ V(k) = \frac{1}{N} \sum_{L} X_1(L)\, X_2(k-L) \]

For continuous inputs, correlation is performed on X1(k) and X2(k), where
X1(k) is N samples while X2(k) has N/2 zeros adjoined. The same steps
are followed as described above but only the first N/2 output samples are
valid. Convolution is performed similarly but the last N/2 samples are
retained (see Reference 1 for more details).
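
The zero-padded correlation and convolution procedures can be sketched as
below. A directly evaluated transform stands in for the DISP FFT, the
function names are hypothetical, and the 1/N scale factor is applied only in
the inverse transform, so the results are plain sums of products. That
normalization is an assumption of this sketch.

```python
import cmath


def dft(samples, inverse=False):
    """Directly evaluated forward or inverse transform."""
    n = len(samples)
    sign = 1 if inverse else -1
    out = [sum(samples[m] * cmath.exp(sign * 2j * cmath.pi * j * m / n)
               for m in range(n)) for j in range(n)]
    return [v / n for v in out] if inverse else out


def adjoin_zeros(data):
    """Adjoin as many zeros as there are data values (N/2 data, N/2 zeros)."""
    return list(data) + [0.0] * len(data)


def correlate(x1, x2):
    """Inverse transform of Y1(j) Y2(j)*; lag L is stored at index L mod N."""
    y1, y2 = dft(adjoin_zeros(x1)), dft(adjoin_zeros(x2))
    z = [a * b.conjugate() for a, b in zip(y1, y2)]
    return [v.real for v in dft(z, inverse=True)]


def convolve(x1, x2):
    """Inverse transform of Y1(j) Y2(j)."""
    y1, y2 = dft(adjoin_zeros(x1)), dft(adjoin_zeros(x2))
    z = [a * b for a, b in zip(y1, y2)]
    return [v.real for v in dft(z, inverse=True)]


x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.5, -1.0, 0.0, 2.0]
r12 = correlate(x1, x2)
v12 = convolve(x1, x2)
```

Without the adjoined zeros the products would wrap around the end of the
window, and the outputs would be circular rather than linear sums.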


Multiple  Length  Sample  Size 

The number of modules in a DISP is determined by the sample size of the FFT
(or FWT). Nevertheless, a DISP can compute an FFT (or FWT) of sample
sizes either larger or smaller than the one for which it was designed. The
computation for a smaller sample size requires that only part of the
algorithm be performed. The computation for larger sample sizes requires
dividing the sample set into groups. After performing an FFT on each
group, the resulting outputs are reordered (by external computer). These
are also divided into groups and a partial transform performed on each
group. The flow diagram for the case of a double sized window (2N) is
shown in Figure 2-4. For the case of 2N there are two complete transforms
and two partial transforms. For the case of 4N there are four complete and
four partial transforms.

The  procedure  for  a 2N  window  is  as  follows: 

1.  Perform  an  N point  transform  on  the  even  numbered  points 
and  shuffle  outputs 

2.  Repeat  (1)  on  the  odd  numbered  points 

3.  Perform  one  stage  of  an  N point  transform  on  each  half  of 
the  outputs  from  (1)  and  (2)  using  the  weighting  coefficients 
for  the  last  stage  of  a 2N  transform  and  then  shuffle  outputs 
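
The 2N procedure above can be sketched in a few lines. Here a directly
evaluated N-point transform stands in for the DISP, the final combining stage
uses the weighting coefficients of a 2N transform, and the reordering
("shuffle") is implicit in the indexing. The function names are hypothetical.

```python
import cmath


def n_point_transform(samples):
    """Stand-in for the N-point transform performed by the processor."""
    n = len(samples)
    return [sum(samples[m] * cmath.exp(-2j * cmath.pi * j * m / n)
                for m in range(n)) for j in range(n)]


def double_length_transform(samples):
    """2N-point transform from two N-point transforms plus one extra stage."""
    total = len(samples)
    n = total // 2
    even = n_point_transform(samples[0::2])   # step 1: even numbered points
    odd = n_point_transform(samples[1::2])    # step 2: odd numbered points
    out = [0j] * total
    for j in range(n):                        # step 3: last stage of 2N transform
        w = cmath.exp(-2j * cmath.pi * j / total)
        out[j] = even[j] + w * odd[j]
        out[j + n] = even[j] - w * odd[j]
    return out


samples = [float((3 * k) % 7 - 3) for k in range(16)]
combined = double_length_transform(samples)
reference = n_point_transform(samples)   # direct 16-point evaluation
```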

The procedure for a 4N window is as follows:

1.  Perform an N-point transform using every fourth sample.
Shuffle outputs.

2.  Repeat (1) three times.

3.  Perform two stages of an N-point transform on each quarter
of the outputs from (1) and (2) using the weighting coefficients
from the next-to-last and last stages of a 4N transform. Each
transform output is one-fourth of the 4N transform.
Figure 2-4. Sixteen Point FFT Using an Eight Point Processor


Energy-Time-Frequency, ETF

For a continuous output of a filter bank, one generally wants the energy
rather than the filter output directly. This is accomplished by squaring the
outputs and passing them through a low pass filter. Thus, the operations in
sequence are DFB, SQ, LPF.


Frequency  Translation,  FT 

Often  it  is  desirable  to  obtain  finer  frequency  resolution  over  some  portion 
of  the  frequency  spectrum.  This  is  handled  by  the  frequency  translation 
mode.  The  procedure  is  as  follows: 

1.  Select the lower and upper frequency points Y_L(j), Y_H(j).
At least four frequency points should be included (two
besides Y_L(j) and Y_H(j)).

2.  Perform FFT on window 1 to obtain Y(j).

3.  Perform (FFT)^-1 on Y(j) within selected interval and store
time samples X(k). The number of time samples equals
the number of Y(j) retained.

4.  Repeat steps 2 and 3 until the number of time samples
equals N.

5.  Perform an FFT on the N sample time function. This
provides an N sample resolution of the selected interval.
It is noted that the input/output transfer rates become limiting in some
modes. At an effective bit transfer rate of 3.684 x 10^6 bits/sec, the
transfer rate limits the processing for XPDF, FFT(2) and FFT(4). At a rate of
14.736 x 10^6 bits/sec, the transfer rate limits FFT(2) and FFT(4). In this
latter case the computation time is only slightly less than the transfer time.

Also, one notes that all modes except ETF can handle a 50 ks/sec sampling
rate. Thus, real time processing can handle a 20 kHz input signal bandwidth.
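
A rough check of this statement compares each processing time in Table 2-II
(lower transfer rate column) with the time needed to acquire the input at
50 ks/sec. The number of new samples consumed per pass is an assumption made
for this sketch and is not stated in the text: 512 for the single-window
modes, 256 for correlation and convolution (N/2 data values), 1024 and 2048
for the multiple-length FFTs, and one sample per pass for ETF.

```python
SAMPLE_RATE = 50_000  # samples per second

# mode: (processing time in msec, assumed new samples consumed per pass)
MODES = {
    "PDF": (1.738, 512),
    "XPDF": (4.992, 512),
    "R11": (3.718, 256),
    "R12": (4.222, 256),
    "H12": (4.222, 256),
    "FFT(2)": (18.304, 1024),
    "FFT(4)": (36.608, 2048),
    "ETF": (0.936, 1),
    "FT": (1.738, 512),
}


def keeps_up(processing_msec, samples_per_pass):
    """True if one pass finishes before the next block of samples arrives."""
    acquisition_msec = samples_per_pass / SAMPLE_RATE * 1000.0
    return processing_msec <= acquisition_msec


real_time = {name: keeps_up(*entry) for name, entry in MODES.items()}
too_slow = [name for name, ok in real_time.items() if not ok]
nyquist_hz = SAMPLE_RATE / 2   # 25 kHz, above the quoted 20 kHz bandwidth
```

Under these assumptions only ETF fails to keep pace, in agreement with the
statement above.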

Some  of  the  complex  modes  are  illustrated  in  Figure  2-5.  These  show  the 
repeated  application  of  the  basic  modes.  They  also  show  the  relationships 
between  sample  lengths  and  resolution. 


FUNCTIONAL  MODES 

The basic and complex modes can be used to perform a variety of signal
processing functions. Some typical examples are listed below. Generally,
these require input/output and other processing functions in addition to the
DISP. To make the illustration specific we have assumed two different
configurations using mini-computers. The DISP would be under control of
the computer. The computer would also provide data storage, data reordering,
post-processing and data display and output.

The major factor is the transfer rate of the computer. With direct memory
access (DMA), the rates are:

    H316          - 0.312 x 10^6 samples/sec (16 bit)

    Supernova SC  - 1.25 x 10^6 samples/sec (16 bit)
Figure 2-5. Complex Mode Operations (panel shown: Energy-Time-Frequency, ETF)
The DISP outputs one 12-bit word every 13 bit times (13 µsec). For the lower
transfer rate, 4 output channels would be packed into 3 16-bit words. For
the higher rate, 16 channels would be packed into 12 16-bit words. The
resulting effective transfer rates are:

    H316          - 0.307 x 10^6 samples/sec (12 bit)

    Supernova SC  - 1.25 x 10^6 samples/sec (12 bit)

The minimum time to transfer a set of samples is

Sample Transfer Rate

                  256 samples    512 samples    1024 samples

H316              0.832 msec     1.664 msec     3.328 msec
Supernova SC      0.208 msec     0.416 msec     0.832 msec

Using  these  transfer  rates,  the  speeds  for  specific  applications  can  be  determined. 
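
The tabulated transfer times follow from the 13-bit word time and the number
of output channels read in parallel (4 for the H316, 16 for the Supernova SC).
The sketch below reproduces them; the function and constant names are
hypothetical.

```python
WORD_TIME_USEC = 13   # one 12-bit word per 13 bit times of 1 usec each

CHANNELS = {"H316": 4, "Supernova SC": 16}


def transfer_time_msec(samples, channels):
    """Minimum time to move `samples` words over `channels` parallel outputs."""
    word_times = samples / channels
    return word_times * WORD_TIME_USEC / 1000.0


table = {
    name: [transfer_time_msec(n, ch) for n in (256, 512, 1024)]
    for name, ch in CHANNELS.items()
}
print(table)
```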


Logarithmic Frequency Analysis, LFA

The first application is for spectral analysis over a wide frequency range.
Both proportional and logarithmic frequency intervals are available. We will
describe the logarithmic since it is more complex to implement. The input is
assumed to be sampled at 50 ks/sec and quantized into 12-bit words. Further,
each decade in frequency will be sampled separately as shown in Figure 2-6.
It is desired to form a time-averaged 1/3-octave power spectrum. The power
spectrum is formed in DISP and the frequency and time averaging performed in
the GP computer.
Figure 2-6. Logarithmic Frequency Analyzer

The DISP performs the power spectrum operation on each window of 512
samples from the high speed channel. The slower data channels are fed
into the computer. Every 10th window, the 512 samples from the medium
speed channel are processed, and likewise for every 100th window for the
low speed channel. The resulting spectrum is illustrated in Figure 2-7.

The frequency coefficients can be averaged into logarithmic intervals.
Two typical intervals, 1/3 and 1/15 octave, are shown. For the 1/15
octave, the first (and smallest) band contains one frequency coefficient.
The last band contains 10. For the 1/3 octave, there are five times as
many coefficients per band. This averaging of coefficients over frequency
bands is performed in the GP computer. Also, any time averaging is
performed in the GP computer.

If finer frequency resolution is required, multiple windows can be processed.
Using a 4-window mode would increase the frequency resolution by four. This
increased resolution for the power spectrum is also shown in Figure 2-7.

If much finer resolution is required over some part of the spectrum, the
frequency translation mode can be utilized. Suppose the band from 100 Hz
to 112.8 Hz is to be expanded. This band contains 16 frequency coefficients
saved from each transform of the 512 data window. The coefficients are
inverse transformed to form a time sample of 16 points. After 32 such
windows (32 seconds), the time sample is 512 points. It is transformed to
provide a 256-point frequency set from 100 Hz to 112.8 Hz. The frequency
resolution is 1/16 of the previous Δf, or 0.05 Hz.


Coherent  Detection  System 

The coherent detector detects the target and estimates its position and
velocity. In the case of coherent detection, the transmitted signal rT(t) is
reflected from some target and the received signal s(t) contains range,
velocity and acceleration information about the target. For narrow band
detection the two operating modes are a) Doppler search and b) Range search.
Figure 2-7. Power Spectrum Output Formats (panels include: resolution for
quadruple window; expanded resolution using frequency translation)
For the narrow band coherent detector in the Doppler Search Mode (Figure 2-8)
the received signal s(t) is quadrature demodulated, lowpass filtered, and
converted from analog to digital form. It is then multiplied by the reference
transmitted signal, and the product is Fourier transformed. The square of the
Fourier components represents the ambiguity function for a particular delay
(range) as a function of frequency shift (Doppler).

For the narrow band coherent detector in the Range Search Mode (Figure 2-9)
the received signal is processed initially as before to form the complex
signal, Sc(n). The DISP-FFT is used to Fourier transform 512 samples of Sc(n).
The transform Sc(f) is multiplied by the Fourier transform of the reference
signal R(f), which may have been stored in the DISP premultiply shift
registers. Results are then processed through the inverse FFT. The square
magnitude of the output represents the ambiguity function for a particular
Doppler (see Figure 2-9).

Wideband coherent detection may require several references because of
decorrelation at large Doppler shifts. For example, in the Doppler Search
Mode, M reference signals r_i(t) with M different Doppler shifts are used
(Figure 2-10). Each reference signal is multiplied by the received signal
Sc(n) and then the product is FFT transformed. The magnitude squared
represents the ambiguity function about the reference Doppler. Each reference
Doppler and FFT transformation may be performed in parallel with M DISP's, or
sequentially with one DISP and M reference signals stored in M shift
registers.


Figure 2-9. Narrow Band Coherent Detector: Range Search Mode


Walsh-Fourier Signal Representation

The chief advantage of using the Walsh-Fourier representation is the
increased speed in performing the transform. The Walsh-Fourier
representation may be useful, especially in the area of data compression and
signal classification (2). Application of the Walsh transform to obtain the
power spectral coefficients of the channel vocoder before transmission
over a channel has been noted by several authors (3). Other investigators
(4, 5) have studied the merits of the "transformation compression" approach
relative to other methods of data compression, finding it efficient but
difficult to implement. Perhaps the Walsh transform with its simple
implementation in DISP will make this method practical.


Figure 2-10. Wide Band Coherent Detector: Doppler Search Mode


SECTION III
ACCURACY 

Accuracy  is  critical  in  digital  operations:  too  few  bits  lead  to  erroneous 
results;  too  many  bits  decrease  speed  and  increase  costs.  Consequently, 
numerous  application  studies  were  made  before  selecting  12  bits  as  the 
nominal  word  length  for  DISP.  In  addition,  the  accuracy  of  DISP  operating 
as  a Fast  Fourier  Transform  (FFT)  and  as  a Digital  Filter  Bank  (DFB) 
was  evaluated  theoretically  as  well  as  experimentally.  The  experiments 
used  an  exact  simulation  of  DISP  on  a general  purpose  computer. 

FFT 

The accuracy analysis of the fast Fourier transform mode included both
statistical and deterministic effects. The statistical analysis evaluated the
effects of roundoff and truncation. The theoretical (8) values are:

Roundoff Error

    $NSR_r = 2 n \sigma_e^2 = 2 \sigma_e^2 \log_2 N$

where

    $N = 2^n$ is the sample size
    $\sigma_e^2$ = error variance

Truncation Error

    $NSR_t = (2n + 81) \sigma_e^2$

For a white noise input, the value of $\sigma_e^2$ is [illegible], or about
$10^{-7}$ for N = 256. The noise-to-signal ratio from both sources is,
therefore, about $10^{-5}$.

A simulation using a sinusoidal input gave a noise-to-signal ratio of
$2 \times 10^{-5}$. Theoretical analysis shows that the sinusoidal input
should produce noise 15% greater than for a white noise input. Thus, the
simulation results agree closely with the theoretical predictions (6)
($2 \times 10^{-5}$ vs. $1.15 \times 10^{-5}$). A complete analysis is given
in Reference 8.
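The arithmetic behind the "about 10^-5" figure can be checked with a short
Python sketch. It assumes the two expressions read as NSR_r = 2n * var and
NSR_t = (2n + 81) * var, with n = log2 N and an error variance of about
10^-7; the constant 81 and the variance are taken as they appear in the
text, and the function names are hypothetical.

```python
import math

def nsr_roundoff(sample_size, error_variance):
    """Roundoff noise-to-signal ratio, assumed form 2 * n * variance."""
    n = int(math.log2(sample_size))
    return 2 * n * error_variance

def nsr_truncation(sample_size, error_variance):
    """Truncation noise-to-signal ratio, assumed form (2n + 81) * variance."""
    n = int(math.log2(sample_size))
    return (2 * n + 81) * error_variance

ERROR_VARIANCE = 1e-7      # "about 10^-7" for N = 256
roundoff = nsr_roundoff(256, ERROR_VARIANCE)        # 16 * 1e-7
truncation = nsr_truncation(256, ERROR_VARIANCE)    # 97 * 1e-7
total = roundoff + truncation                       # 113 * 1e-7
```

Under these assumptions the sum is 1.13 x 10^-5, in line with the stated
order of magnitude.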


Dynamic  Range 

Dynamic range can be measured two ways. One is the ratio of maximum to
minimum values of input. This is the inverse of the quantization accuracy,
or $2^{11}$ = 66 db. (Note that DISP scales automatically so that the full
dynamic range is always utilized.)
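The 66 db figure follows from the word length, as this small Python check
suggests. It assumes that one of the 12 bits carries the sign, leaving 11
magnitude bits; that split is an inference from the 66 db value, not a
statement in the text.

```python
import math

WORD_LENGTH_BITS = 12
MAGNITUDE_BITS = WORD_LENGTH_BITS - 1     # assumption: one bit is the sign

ratio = 2 ** MAGNITUDE_BITS               # largest to smallest input magnitude
dynamic_range_db = 20 * math.log10(ratio)
```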

A second way to measure dynamic range is to insert two signals, $A_1$ and
$A_2$. As $A_2$ is decreased in magnitude, the error in its FFT
representation will increase. This error was determined experimentally by
introducing an input signal,

    $X(t) = A_1 \cos 2\pi f_{63} t + A_2 \cos 2\pi f_k t$

The ratio $A_2/A_1$ was varied and the FFT computed over all values of $f_k$.

The resulting deviation in the estimated value of $A_2$ from the actual value
is shown in Figure 3-1 for 12-bit accuracy and in Figure 3-2 for 16-bit
accuracy.

The experimental results for 12-bit words show that a dynamic range of
40 db in $A_1/A_2$ gives a maximum error of 2.5 db in estimating $A_2$.
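A reduced version of this two-tone experiment can be sketched in Python. The
sketch quantizes only the input samples to the word length and then measures
the weak tone with a direct DFT at its own bin; it does not model the
roundoff, truncation, or scaling inside DISP, so its errors are smaller than
the 2.5 db reported above. The amplitudes, the bin k = 20, and all function
names are assumptions chosen for illustration.

```python
import cmath
import math

def quantize(value, bits):
    """Round to the nearest level of a signed fixed-point word of given width."""
    step = 2.0 ** -(bits - 1)
    return round(value / step) * step

def tone_amplitude(samples, k):
    """Estimate the amplitude of the cosine at DFT bin k (0 < k < N/2)."""
    n = len(samples)
    acc = sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
    return 2.0 * abs(acc) / n

def weak_tone_error_db(a1, a2, k, bits, n=256, strong_bin=63):
    """Error (db) in the estimate of a2 when the input is quantized to `bits`."""
    x = [quantize(a1 * math.cos(2 * math.pi * strong_bin * t / n)
                  + a2 * math.cos(2 * math.pi * k * t / n), bits)
         for t in range(n)]
    return 20 * math.log10(tone_amplitude(x, k) / a2)

# A1/A2 = 100, i.e. a 40 db spread between the two tones.
err_12_bit = weak_tone_error_db(0.5, 0.005, k=20, bits=12)
err_16_bit = weak_tone_error_db(0.5, 0.005, k=20, bits=16)
```

Sweeping k and the amplitude ratio in the same way would give curves of the
kind plotted in Figures 3-1 and 3-2, though only for the input-quantization
part of the error.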
Figure 3-2. Percent Error versus Dynamic Range with Input
$A_1 \cos 2\pi f_{63} t + A_2 \cos 2\pi f_k t$, a 256 Sample Window, and
16 Bits/Word
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SECTION  IV 
DISP  ORGANIZATION 


SYSTEM  DESCRIPTION 

A DISP consists of a control unit, a number of identical Processing Modules,
and 2 shift registers per module. Each module can process 2 samples of an
FWT or FFT or can implement one bandpass filter.

Referring  to  the  DISP  block  diagram  in  Figure  1-1,  data  inputs  are  loaded 
into  the  input  buffer  bit  serially,  with  the  real  and  the  imaginary  portions 
in  parallel.  Interconnecting  the  input  and  output  pins  of  the  Processing 
Modules  properly  allows  samples  to  flow  down  through  the  modules  of  the 
DISP  to  permit  serial-by-word  loading  and/or  moving  window  operations. 

Size of the window is governed by the number of load instructions preceding
a computation.

Since  the  DISP  operates  in  parallel,  all  outputs  are  available  simultaneously 
bit  serially,  word  parallel.  These  outputs  can  be  accepted  in  this  form,  or 
can  be  stored  in  the  buffer  registers  of  each  processing  module.  If  outputs 
are  stored  internally,  output  instructions  will  feed  the  contents  of  the  imaginary 
part  of  the  word  into  the  real  buffer  register,  while  the  contents  of  the  real 
register  are  output.  Thus,  the  external  buffer  register  is  a 24-bit  serial  in/ 
parallel  out  shift  register  and  a 24-bit  holding  register.  The  number  of 
these  registers  used  determines  the  output  rate. 

Figure  1-1  shows  the  slowest  method  of  obtaining  the  outputs  since  it  uses 
only  one  output  buffer.  The  input  and  output  pins  of  the  buffer  registers  can 
be  properly  connected  to  feed  the  computed  outputs  up  through  the  modules 
into  the  external  buffer  in  unshuffled  order  for  either  the  FFT  or  the  FWT. 

If the unit interfacing with DISP is capable of high-speed operation,
more external buffers can be added. For example, a computer which can
multiplex 24-bit I/O transfers at a rate of 1 MHz could use 24 output buffers.
Unloading the results of a complex 256-point FFT would then require 512/24,
or 22, word times, or 286 µsec. Since this is less than the computation time
of the FFT, 1118 µsec, the FFT could be run at top speed. The output would
always be completed before the next set of data was ready.

A 256-point FWT requires only 9 word times, and the last word loads new
data into the internal buffers. Thus, only 8 output instructions could be
performed during the next computation. The output of this computation would
have to be delayed while the remaining 14 output instructions are performed.
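The output-timing figures in the two paragraphs above can be reproduced with
a short Python calculation. The 13 µsec word time is not stated directly; it
is inferred here from 286 µsec divided by 22 word times, and the constant
names are hypothetical.

```python
import math

OUTPUT_WORDS = 2 * 256          # real and imaginary parts of 256 complex points
OUTPUT_BUFFERS = 24
WORD_TIME_US = 13               # assumption: 286 usec / 22 word times
FFT_COMPUTE_US = 1118
FWT_COMPUTE_WORD_TIMES = 9

unload_word_times = math.ceil(OUTPUT_WORDS / OUTPUT_BUFFERS)
unload_us = unload_word_times * WORD_TIME_US
fft_output_hidden = unload_us < FFT_COMPUTE_US

# The last FWT word time reloads the internal buffers, so it cannot carry output.
fwt_overlapped_outputs = FWT_COMPUTE_WORD_TIMES - 1
fwt_delayed_outputs = unload_word_times - fwt_overlapped_outputs
```

With these numbers the FFT output is fully hidden behind the computation,
while the FWT must wait for 14 of its 22 output instructions.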

The  fixed  interconnections  of  the  processing  modules  are  shown  in  Figure  4-1, 
for  the  FFT  algorithm  of  Figure  2-1.  Four  modules  are  required  as  well  as 
8 shift  registers.  The  8 shift  registers  hold  the  two  words  required  for  the 
premultiplications,  and  the  three  weighting  coefficients  required  for  each 
module.  Registers  one  through  four  hold  real  components,  while  five  through 
eight  contain  imaginary  components. 

Each processing module receives two complex inputs representing the i-th
and the (i + N/2)-th data samples. Each module is identical in construction,
and the arithmetic operations are performed serially, bit by bit, with all
modules computing in parallel.

Each module performs the computations indicated by two rows of the transform
algorithm. As seen from Figures 2-1 and 4-1, module number 1 receives
inputs $F_0$ and $F_4$ and forms the sum (S) and difference (D) operations of
the top two rows of the algorithm. Module 2 receives inputs $F_1$ and $F_5$
and computes the operations of the next two rows, etc. Thus, after one
iteration time, the outputs of the modules represent the nodes in column 3 of
Figure 2-1. During subsequent iteration times the outputs of columns 2 and 1
are formed. Thus, with all N/2 modules, each operating in parallel, an entire
column is computed at once. The number of iteration times is k, where the
sample size $N = 2^k$.
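The fixed pairing of sample i with sample i + N/2 on every iteration is the
pattern of a constant-geometry radix-2 FFT. The Python sketch below shows one
textbook arrangement with that pairing (decimation in frequency, weighting
applied to the difference); it is offered as an illustration and may differ
in detail from the flow graph of Figure 2-1, for example in where the
weighting coefficients are applied. Function names are hypothetical.

```python
import cmath

def parallel_iterative_fft(samples):
    """Radix-2 FFT in which every pass pairs sample i with sample i + N/2."""
    n = len(samples)
    k = n.bit_length() - 1                  # number of iteration times, N = 2**k
    if n < 2 or (1 << k) != n:
        raise ValueError("sample size must be a power of two, at least 2")
    half = n // 2
    data = [complex(v) for v in samples]
    for stage in range(k):
        nxt = [0j] * n
        for i in range(half):               # one "module" per i, all independent
            a, b = data[i], data[i + half]
            exponent = (i >> stage) << stage
            w = cmath.exp(-2j * cmath.pi * exponent / n)
            nxt[2 * i] = a + b              # sum (S)
            nxt[2 * i + 1] = (a - b) * w    # difference (D), then weighting
        data = nxt
    # Results emerge in bit-reversed (shuffled) order; put them in natural order.
    out = [0j] * n
    for pos, value in enumerate(data):
        out[int(format(pos, "0{}b".format(k))[::-1], 2)] = value
    return out

def direct_dft(samples):
    """Slow reference transform used only to check the result."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

x = [1, 2, 3, 4, 0, -1, -2, -3]
fast = parallel_iterative_fft(x)
slow = direct_dft(x)
```

Because the inner loop over i has no dependence between iterations, N/2
modules could each take one value of i and finish a whole column per
iteration time, giving k = log2 N iteration times in all.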

The control unit contains in its memory all programs required by the DISP.
When  a given  computation  is  required,  a section  of  this  memory  is  read  out 
sequentially.  Each  memory  word  is  decoded  into  an  instruction  and  distributed 
to  each  of  the  processing  modules.  The  control  unit  also  sends  the  proper 
timing  information  to  each  module,  causing  each  module  to  execute  this 
instruction. 


PROCESSING  MODULE  DESCRIPTION 

In the processing module (Figure 4-2) the logic gates interconnecting the
various module elements are not shown because of their complexity. These
gates are defined by logic gate-enable equations (Appendix C) written using
the notation shown in Figure 4-2 (e.g., IA^R represents the input to the
A^R register). The notation is identified in Table 4-I.

Table 4-I.  Notation for DISP Module

A^R     Intermediate register - real word
A       Adder
B^R     Output register - real word
C       Complementer
D^R     Input register - real word
T^R     Premultiply register - real word
IA^R    Input to register A^R
E_C1    Enable complementer 1
OA^R    Output from register A^R
OV      Overflow
R^R     Reference register - real word
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Figure  4-2.  DISP  Module  Block  Diagram 
The module contains 18 12-bit shift registers designated as A, B, D, T, and
R, as well as 6 conditional complementers designated C1 through C6
(Figure 4-3). Figure 4-4 presents two of the four serial adders, while
Figure 4-5 shows the other two. These adders are designed to detect and
correct all arithmetic overflow which may occur during computation. A
detailed explanation of their operation is presented in Appendix D.


Figure 4-3. Complementer

The complexity of the processing module in equivalent AND/OR gates is
shown in Table 4-II. When implemented in MOS technology, approximately
3.5 devices are required for the average gate.

Thus, these 891 logic gates would require approximately 3100 MOS devices.
Two builders of semiconductor devices have assured Honeywell that this
module can be fabricated on one low threshold (bipolar compatible) LSIC.
Figure 4-4. First Adder with Overflow Detection

Figure 4-5. Second Adder with Overflow Detection
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Table  4-II.  Equivalent  AND/OR  Gates  of  the  Processing  Module 


Quantity   Description                          Estimated Gates

   18      12-Bit Shift Registers                     432
    4      Adders with Overflow Detection             156
    6      Complementers                               48
    3      Flip-Flops and Latches                      12
    1      23-Bit Shift Register                       46
    1      23-Bit Latch                                46
           Miscellaneous Gates (Appendix B)           151

           Total                                      891
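The gate total and the MOS device estimate can be cross-checked with a few
lines of Python, using the entries of Table 4-II and the figure of 3.5
devices per gate quoted in the text. The dictionary and variable names are
hypothetical.

```python
GATE_ESTIMATES = {
    "12-bit shift registers (18)": 432,
    "adders with overflow detection (4)": 156,
    "complementers (6)": 48,
    "flip-flops and latches (3)": 12,
    "23-bit shift register (1)": 46,
    "23-bit latch (1)": 46,
    "miscellaneous gates": 151,
}
DEVICES_PER_GATE = 3.5

total_gates = sum(GATE_ESTIMATES.values())
mos_devices = total_gates * DEVICES_PER_GATE
```

The product comes to 3118.5 devices, which the text rounds to approximately
3100.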

The module can be housed in a 40-pin package, using 13 pins for outputs and
26 for inputs.

The  module  can  perform  23  basic  subinstructions  (see  Appendix  E).  A number 
of  these  subinstructions  are  enabled  during  a given  word  time  to  form  an 
instruction.  Subinstructions  perform  the  following  functions: 

• Add  for  real  multiply  (AF) 

• Add  for  complex  multiply  (AG) 

• Add  for  forming  sum  and  difference  (ADAW) 

• Add  for  forming  sum  and  difference  of  (A)  and  the  complex 
conjugate  of  (B)  (ADAW) 

• Load  Reference  and  Data  register  (LDR  and  LDD) 

• Load  buffer  registers  (LDB) 

• Output  buffer  registers  (OB) 

• Exchange  contents  of  A registers  (EXA) 
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• Various  transfers  of  Data  to  A registers 

• Various  transfers  of  Data  to  T registers 

• Various  transfers  of  Data  to  D registers 

Instructions are received serially by the processing modules into the 23-bit
shift register of Figure 4-2. After this register is loaded, the data is
transferred in parallel to the 23-bit latch. Each bit of this latch
corresponds to one of the subinstructions which may be included in the
instruction. An instruction is thus represented by the subinstructions which
have a logic "one" stored in the 23-bit latch. While one instruction is being
executed, another is being entered serially into the 23-bit shift register of
each processing module from the control unit. At the end of each word time
the contents of this register are gated into the latch, where they present
the proper gate enables for the next word. Note that the logic-enable
equations of Appendix C include the appropriate subinstructions as gate
inputs.
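The overlap between executing one instruction and receiving the next can be
modeled with a small Python class. This is a behavioral sketch only: the bit
positions used in the example are arbitrary, since the assignment of
subinstructions to latch bits is defined in the appendices, and the class and
method names are hypothetical.

```python
class ModuleInstructionRegister:
    """23-bit serial shift register feeding a 23-bit parallel latch."""

    WIDTH = 23

    def __init__(self):
        self.shift_register = [0] * self.WIDTH
        self.latch = [0] * self.WIDTH

    def clock_in(self, bit):
        """Shift one instruction bit in from the control unit."""
        self.shift_register = self.shift_register[1:] + [1 if bit else 0]

    def end_of_word_time(self):
        """Gate the shift register contents into the latch in parallel."""
        self.latch = list(self.shift_register)

    def enabled_subinstructions(self):
        """Latch positions holding a logic one, i.e. the active gate enables."""
        return [i for i, bit in enumerate(self.latch) if bit]

def encode(enabled_positions, width=23):
    """Build the serial bit stream for a set of enabled latch positions."""
    return [1 if i in enabled_positions else 0 for i in range(width)]

reg = ModuleInstructionRegister()
for bit in encode({0, 5}):          # first instruction arrives serially
    reg.clock_in(bit)
reg.end_of_word_time()
first = reg.enabled_subinstructions()

for bit in encode({7}):             # next instruction arrives during execution
    reg.clock_in(bit)
still_first = reg.enabled_subinstructions()
reg.end_of_word_time()
second = reg.enabled_subinstructions()
```

The latch keeps presenting the first instruction's enables while the second
is shifted in, and switches only at the word-time boundary.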

Prior to modifications to expand the capabilities of DISP, a complete logic
level simulation was performed on the processing module design. The
simulation verified all logic-enable equations, adder operation, and overflow
detection and correction within the module. The functional test written for
the module was also verified (see Appendix F). Subsequent changes to DISP
leave the design approximately 95% verified by simulation. The functional
test will also have to be expanded to check the new instruction LDTR2.


CONTROL  UNIT  DESCRIPTION 

The processor control unit (Figure 4-6) is not yet designed in detail. The
read-only memory will contain the coded instructions of all programs which
can be computed by the DISP. The control programs required by DISP
(Appendix G) consist of instructions made up of various combinations of the
23 basic subinstructions listed in Appendix E. The number of unique
instructions used in these programs is found to be the 26 shown in
Table 4-III.

Figure 4-6. Processor Control

The only subinstruction not included in any of these instructions is the OB
subinstruction (Output Buffer). It is planned that the program store will
consist of a read-only memory of 6-bit words. Five bits will be used to
encode the 26 unique instructions, and the sixth bit will be used for the OB
subinstruction.

The  number  of  storage  words  required  will  be  a function  of  the  number  of 
processing  modules  in  the  DISP,  and  the  number  of  external  buffers.  In  any 
case,  this  memory  should  not  exceed  512  words. 

The  instruction  decoder  decodes  the  5-bit  memory  word  into  the  proper  set 
of  subinstructions  and  loads  them  into  the  shift  register.  This  register  then 
transfers  this  instruction  to  all  the  processing  modules  simultaneously. 
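A decoder for the planned 6-bit program word might look like the following
Python sketch. The bit layout is an assumption (low five bits select the
instruction, high bit carries OB), as is the use of the Table 4-III
instruction number as the 5-bit code; only a few table rows are included, and
the names are hypothetical.

```python
# A few rows of Table 4-III, keyed by instruction number (assumed to be the code).
INSTRUCTION_TABLE = {
    3: ("EXA",),
    7: ("LDA7", "LDD"),
    19: ("LDT1", "LDR"),
    20: ("LDR",),
    24: ("LDD",),
}

OB_BIT = 0b100000          # assumption: sixth bit is the most significant
CODE_MASK = 0b011111

def decode(word):
    """Expand a 6-bit program word into its list of subinstructions."""
    if not 0 <= word < 64:
        raise ValueError("program words are 6 bits wide")
    code = word & CODE_MASK
    if code not in INSTRUCTION_TABLE:
        raise KeyError("instruction %d is not in this excerpt" % code)
    subinstructions = list(INSTRUCTION_TABLE[code])
    if word & OB_BIT:
        subinstructions.append("OB")    # output buffer rides along on any instruction
    return subinstructions

plain = decode(7)
with_output = decode(OB_BIT | 7)
```

Keeping OB on its own bit lets output instructions overlap any of the 26
computation instructions without enlarging the instruction set.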

The control unit will also include clocks, drivers, counters and other logic
required to generate other outputs to the modules. The processor control
will also detect overflow in any module, and notify all modules that such has
occurred. A count of overflow occurrences is maintained during a computation
so that the proper scale factor can be applied to the output. Upon
notification that overflow has occurred, each module will scale its data down
by one half.
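The overflow handling described above amounts to a block scaling scheme with
a shared exponent. The Python sketch below is a simplified model: it checks a
whole block of values against a limit after the fact, whereas the adders in
the module detect and correct overflow during the addition itself. The
function name and the limit of 1.0 are assumptions.

```python
def scale_on_overflow(values, limit=1.0):
    """Halve every value until none overflows; return the block and the count."""
    block = list(values)
    overflow_count = 0
    while any(abs(v) >= limit for v in block):
        block = [v / 2 for v in block]      # every module scales down by one half
        overflow_count += 1                 # the control unit keeps the count
    return block, overflow_count

scaled, count = scale_on_overflow([0.25, -1.5, 0.75])
restored = [v * 2 ** count for v in scaled]  # scale factor applied to the output
```

Because all modules scale together, a single count is enough to recover the
true magnitude of every output.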

The operation of DISP is determined by a program-select input which defines
the area of program storage containing the instructions for the desired
computation. This block of memory is sequentially read out from the memory,
decoded and transmitted to all processing modules. Interrupts allow DISP to
function in a system containing other devices.
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Table  4 -III,  Control  Instructions 


Unique
Instruction  Arith        A Reg        T Reg   R Reg  Shift  D Reg  B Reg

     1       Ag           LDA2
     2       [illegible]  LDA2         LDT3           SHR1
     3                    EXA
     4       Ag           LDA2                        SHR1
     5       Ad           LDA5         LDT1
     6       Ad                        LDT2
     7                    LDA7                               LDD
     8       Ad                                                     LDB
     9       Ad           LDA6, LDA5
    10                    LDA7
    11       Ad                                              LDD    LDB
    12       Af           LDA3
    13       Af           LDA4                        SHR1
    14       Af           LDA3                        SHR1
    15       Ad           LDA4, LDA8
    16       Ad                        LDT5
    17       Ad
    18       Ag           LDA2         LDT4           SHR1
    19                                 LDT1    LDR
    20                                         LDR
    21       Ad                                              LDD1
    22       Ad           LDA4
    23                                         LDR           LDD
    24                                                       LDD
    25                                 LDT2
    26                                 LDTR2
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APPENDIX  A 

COMPARING  DISP  WITH  OTHER  IMPLEMENTATIONS 


The  following  tables  were  reproduced  from  the  article  in  the  June  1969 
Transactions  of  IEEE  Audio  and  Electroacoustics  by  Glen  Bergland,  "Fast 
Fourier  Transform  Implementations,  A Survey",  pp.  109-117.  The  DISP 
characteristics  are  shown  generally  for  a 128-module  processor.  Exceptions 
are:  For  the  case  of  maximum  number  of  samples,  1024  modules  are  assumed. 
The  maximum  throughput  for  N = 1024  assumes  512  modules  and  a clock  rate 
of  1.4  MHz. 
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TABLE  A1 

Column headings (processors compared), in source order:

Bell Telephone Labs. FFTP
Computer Signal Processors CSS-3
Computer Signal Processors CSP-30
Control Data Corp. FFT Processor
Dept. of Defense COMP-II
Emerson Digital Signal Processor
Emerson MM DSP
IBM Array Proc. 2938-1
IBM Array Proc. 2938-2
M.I.T. Lincoln Labs. FDP
McCullough Engrg. Continuous Proc. I
DISP
McCullough Engrg. Continuous Proc. II
Raytheon SRFFT
Stanford Res. Inst. Project CRANE
[illegible]
Texas Instr. System I
Texas Instr. System II
Texas Instr. System III
[illegible]
Washington University MM FFT
Washington University MM FFT
[illegible]
Westinghouse FFT-3
[illegible]

(Legible entries are listed in source order; X marks and other sparse
entries cannot be assigned to columns.)

DESIGN STATUS

Status
  Paper Design
  Breadboard Model: 1-69, 7-69
  Date Operational (Past or Future): 5-67, 3-69, 9-69, 3-69, 7-69, 9-68,
    12-69, 7-68, 8-68, 9-70, 6-69, 1 yr (1), 3-69, 1-68, 3-68, 2-69, 12-68,
    3-69, 12-69, 10-67, 6-69, 7-69, 7-69, 12-69, 6-70
Objective
  Research Model
  Built for a Specific Application
  Production Model
  Commercially Available
Application
  Off-Line Signal Processing
  Real-Time Signal Processing
  General Scientific

ARCHITECTURE

Classification
  Stand Alone
  G P Computer Attachment
  G P Computer Modification
  Other
Structure
  Sequential
  Cascade
  Parallel-Iterative
  Array
  Other

FUNCTIONAL CHARACTERISTICS

Arithmetic Unit
  Static (or Combinatorial) Multiplier
  Parallel-Iterative Multiplier
  Serial Multiplier
  Real Multiplier
  Complex Multiplier
  Log and Log^-1 Conversions
  Multiplicand (bits): 12, 16, 16, 16, 12, 6, 6, 32, 32, 18, 9, 12, 15, 12,
    16, 12, 16, 18, 18, 12, 8, 12, 24, 12, 10, 10, 10
  Multiplier (bits): 12, 16, 16, 16, 12, 6, 6, 32, 32, 18, 9, 12, 15, 12,
    16, 12, 16, 18, 18, 12, 8, 12, 24, 12, 10, 10, 10
  Product (bits): 12, 32, 32, 16, 12, 7, 7, 32, 32, 36, 9, 12, 15, 12, 31,
    12, 16, 18, 18, 12, 8/15, 12/23, 24, 12, 10, 10, 10
  Truncated Result
  Rounded Result
  Floating Point Numbers
  Automatic Scaling
  Fixed Point Numbers
  Fixed Point with Common Exponent
  One's Complement
  Two's Complement
  Sign Magnitude
  Number of Arithmetic Units
  Technology Used: DTL, DTL, TTL-MSI, TTL, TTL, TTL, TTL, DTL

(1) DISP could be built and tested in a one year program
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TABLE  A1  (Continued) 
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PERFORMANCE  CHARACTERISTICS 


Timing 


Execution Time for N = 1024
(ms)


Execution Time = k N log2 N
(µs); k ≈


Throughput for N = 1024

Max (Complex Samples/Second)  24 000


Execution Time

One Radix-2 Basic Operation (µs)


One Radix-4 Basic Operation (µs)


Precision 


Bits  Input  (Fixed  Point) 


Bits  Output  (Fixed  Point) 


Bits  Mantissa  (Floating  Point) 


Ratio  of  rms  Error/rms  Result 
for  Random  Number  Input 


FUNCTIONS  PERFORMED  (H— Hardware; 
HS — Hardware  Aided  by  Software; 
S — Software  Only) 


Fast  Fourier  Transform 


Inverse  Fast  Fourier  Transform 


Convolution 

Correlation 


Weighting  Input  by  an  Arbitrary 
Data  Window 


Weighting  Input  by  a Specific 
Data  Window 


Squaring  to  Form  Power  Spectrum 


Recursive  Filtering 


Vector  Multiplications 


Multidimensional  FFT 


Spectral  Averaging 


Convolutional  Form  of  FFT 
(Bluestein) 


Expanded  Resolution  in  Any 
Range 


Diagnostic  Tests 


Other 
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A5 


TABLE A1 (Concluded)

Column headings (processors compared), in source order:

Bell Telephone Labs. FFTP
Computer Signal Processors CSS-3
Computer Signal Processors CSP-30
Control Data Corp. FFT Processor
Dept. of Defense COMP-II
Emerson Digital Signal Processor
Emerson MM DSP
IBM Array Proc. 2938-1
IBM Array Proc. 2938-2
M.I.T. Lincoln Labs. FDP
McCullough Engrg. Continuous Proc. I
DISP
[illegible]
Stanford Res. Inst. Project CRANE
[illegible]
Texas Instr. System I
Texas Instr. System II
Texas Instr. System III
[illegible]
Time/Data Model 90
Washington University MM FFT
Washington University MM FFT
[illegible]
Westinghouse FFT-2
[illegible]
(Legible entries are listed in source order; sparse entries and footnote
references cannot be assigned to columns.)

SYSTEM HARDWARE FEATURES

  Maximum Value of N Processed: 8192, 8192, 16 384, 4096, 1024, 64, 16 384,
    32 768, 32 768, 512, 16, 2048, [illegible], 512, 4096, 1024, 65 536,
    1024, 4096, 4096, 1001, 2048, 4096, 4096, 2048, 2048, 2048
  Internal Buffer Size (Words): 8192, 32 768, 65 536, 8192, 8192,
    [illegible], (10), 2048, 32, [illegible], 4096, 2048, 4096-65 536,
    16 384, 16 384, 16 384, 4096, 2048, 4096, 4096, 2048, 2048, 2048
  Internal Word Size (Bits): 24, 16, 16, 16, 48, 6/18, 6/18, (10), (20), 18,
    18, 12, [illegible], 12, 16, 12, 8-20, 24, 24, 24, 8/18, 12, 24, 12, 10,
    10, 10
  Multiplexed I/O Channels
  A/D Bits Converted: 9-14, 8-15, 8-15, 12, 6, 7-8, 9, 5, 9, 14,
    [illegible], [illegible], 8, 8, 8, 8, 10, (81), (81), 10, 10, 10
  A/D Samples/Second (Maximum): 20 000, [illegible], [illegible], 1 000 000,
    1 660 000, 20 000, 100 000, (81), (81), 250 000, 250 000, 2 000 000
  D/A Bits Converted: 16, 8-15, 8-15, 8, 3, 6, 9, 5, 10, 8, [illegible],
    [illegible], 10, 10, (81), (81), 10, 10, 10
  Volume of Processor (ft^3): 3, 24, 25, 52, 50, 3, 3, 153, 153, .05 R, 24,
    25, 50, 65, 6, 10, 10, 4, 1
  Weight (lb): 350, 750, 1000, 75, 75, 3450, 3450, 5R, 250, 500, 1000, 1400,
    400, heavy, heavy
  Max. Ambient Temp. (°F): 140, 120, 120, 167, 150, 150, 90, 90, 140, 100,
    120, 110, 110, 105
  Min. Ambient Temp. (°F): 32, 40, 40, 32, 0, 0, 60, 60, 40, 40, 50, 50, 60
  Circuit Count (Equivalent Gates): 4500, 7000, 5000, 5000, 128,000R, 3100,
    65 000, 48 000, 12 500, 5400, 10 000, 10 000
  Power Consumption (Watts): 1200, 175, 150, 150, 9500, 9500, 40R, 750,
    1000, 4000, 1250, 500, 3000, 2500

COST

  Cents/1024 Point Complex Transform: 0.04*, 0.08*, 0.01*, 0.01*, 0.056*,
    0.034, 0.0015*, [illegible], 0.03*
  Monthly Rental, Processor: $9810
  Purchase Price, Processor Only: $85 000, $356 550, $397 900, $100 000
  Purchase Price, Entire System: $45 000, $100 000, $125 000, $66 925,
    $45 000, $50 000, $40 000
  Approximate 1968 Parts Cost of Processor (If machine is not commercially
    available): $20 000, $100 000, [illegible], [illegible]
  Monthly Rental, Entire System

* Variable
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